Abstract

The problem of segmenting a given image into coherent regions is important
in Computer Vision and many industrial applications require segmenting
a known object into its components. Examples include identifying individual
parts of a component for process control work in a manufacturing plant and
identifying parts of a car from a photo for automatic damage detection.
Unfortunately most of an object's parts of interest in such applications share
the same pixel characteristics, having similar colour and texture. This makes
segmenting the object into its components a non-trivial task for conventional
image segmentation algorithms. In this paper, we propose a "Model Assisted
Segmentation" method to tackle this problem. A 3D model of the object is
registered over the given image by optimising a novel gradient based loss
function. This registration obtains the full 3D pose from an image of the object.
The image can have an arbitrary view of the object and is not limited to a
particular set of views. The segmentation is subsequently performed using
a level-set based method, using the projected contours of the registered 3D
model as initialisation curves. The method is fully automatic and requires no
user interaction. Also, the system does not require any prior training. We
present our results on photographs of a real car.

Keywords. Image segmentation; 3D-2D Registration; 3D Model; Monocular; 
Full 3D Pose; Contour Detection; Fully Automatic. 
1 Introduction

Image segmentation is a fundamental problem in computer vision. Most standard 
image segmentation techniques rely on exploiting differences between pixel regions 
such as color and texture. Hence, segmenting sub-parts of an object which have 
similar characteristics can be a daunting task. We propose a method that performs
such sub-segmentation and does not require user interaction or prior training. A
result from our method is shown in Figure 1 with the car sub-segmented into a 
collection of parts. This includes the hood of the car, windshield, fender, front and 
back doors/windows. 

Many industry applications require an image of a known object to be sub- 
segmented and separated into its parts. Examples include identification of indi- 
vidual parts of a car given a photograph for automatic damage identification or 
the identification of sub-parts of a component in a manufacturing plant for process 
control work. Sub-segmenting parts of an object which share the same color and 
texture is very hard, if not impossible, with conventional segmentation methods. 
However, prior knowledge of the shape of the known object and its components can 
be exploited to make this task easier. Based on this rationale we propose a novel 
Model Assisted Segmentation method for image segmentation. 

We propose to register a 3D model of the known object over a given photo- 
graph/image in order to initialise the segmentation process. The segmentation is 
performed over each part of the object in order to obtain sub-segments from the 
image. A major contribution of this work is a novel gradient based loss function, 
which is used to estimate the full 3D pose of the object in the given image. The 
projected parts of the 3D model may not perfectly match the corresponding parts 
in the photo due to dents in a damaged vehicle or inaccuracies in the 3D model. 
Therefore, a level-set [11] based segmentation method is initialised using initial con- 
tour information obtained by projecting parts of the 3D model at this 3D pose. 
We focus our work on sub-segmentation of known car images. Cars pose a difficult 
segmentation task due to highly reflective surfaces in the car body. The method can 
be adapted to work for any object. 

The remainder of this paper is organised as follows. Previous work related to our 
paper is described in Section 2. We describe the method used to estimate the 3D 
pose of the object in Section 3. The contour based image segmentation approach 
is described next in Section 4. This is followed by results on real photos which are 
benchmarked against state of the art methods in Section 5.

2 Related Work



Model based object recognition has received considerable attention in computer 
vision. A survey by Chin and Dyer [5] shows that model based object recognition 
algorithms generally fall into three categories, based on the type of object representa- 
tion used - namely 2D representations, 2.5D representations and 3D representations. 

2D representations [18, 28] aim to identify the presence and orientation of a specific 
face of 3D objects, for example parts on a conveyor belt. These approaches require 
prior training to determine which face to match to, and are unable to generalise to 
other faces of the same object. 

2.5D approaches [19, 8, 7] are also viewer centred, where the object is known to 
occur in a particular view. They differ from the 2D approach as the model stores 
additional information such as intrinsic image parameters and surface-orientation 
maps. 

3D approaches are utilised in situations where the object of interest can appear in a 
scene from multiple viewing angles. Common 3D representation approaches can be 
either an 'exact representation' or a 'multi-view feature representation'. The latter 
method uses a composite model consisting of 2D/2.5D models for a limited set of 
views. Multi-view feature representation is used along with the concept of general- 
ised cylinders by Brooks and Binford [3] to detect different types of industrial motors 
in the so called ACRONYM system. The models used in the exact representation 
method, on the contrary, contain an exact representation of the complete 3D object. 
Hence a 2D projection of the object can be created for any desired view. Unfortu- 
nately, this method is often considered too costly in terms of processing time. The 
2D and 2.5D representations are insufficient for general purpose applications. For 
example, a vehicle may be photographed from an arbitrary view in order to indicate 
the damaged parts. Similarly, the 3D multi-view feature representation is also not 
suitable, as we are not able to limit the pose of the vehicle to a small finite set of 
views. Therefore, pose identification has to be done using an exact 3D model. Little 
work has been done to date on identifying the pose of an exact 3D model from a 
single 2D image. 

Image gradients. Gray scale image gradients have been used to estimate the 3D 
pose in traffic video footage from a stationary camera by Kollnig and Nagel [10]. 
The method compares image gradients instead of simple edge segments, for bet- 
ter performance. Image gradients from projected polyhedral models are compared 
against image gradients in video images. The pose is formulated using three degrees 
of freedom; two for position and one for angular orientation. Tan and Baker [27] use 
image gradients and a Hough transform based algorithm for estimating vehicle pose 
in traffic scenes, once more describing the pose via three degrees of freedom. Pose 
estimation using three degrees of freedom is adequate for traffic image sequences, 
where the camera position remains fixed with respect to the ground plane. This 
approach, unlike our method, does not recover the full 3D pose.

Feature-based methods [6, 15] attempt to simultaneously solve the pose and point
correspondence problems. The success of these methods is affected by the quality
of the features extracted from the object, and extracting good features is
non-trivial for objects like cars. Features depend on the object geometry and can
cause problems when recovering a full 3D pose. Different image modalities also
cause problems for feature based methods. For example, reflections which may
appear as image features do not occur in the 3D model projection. Our method, on
the contrary, does not depend on feature extraction.

Segmentation. The use of shape priors for segmentation and pose estimation has
been investigated in [22, 21, 23, 25]. These methods focus on segmenting foreground
from background using 3D free-form contours. Our method, on the contrary, does
intra-object segmentation (into sub-segments) by initialising the segmentation using
projections of 3D CAD model parts at an estimated pose. In addition, our method
works on more complex objects like real cars.

3 3D Model Registration 

We describe the use of a featureless gradient based loss function which is used to 
register the 3D model over the 2D photo. Our method works on triangulated 3D 
CAD models with a large number of polygons (including 3D models obtained from 
laser scans) and utilises image gradients of the 3D model surface normals rather 
than considering simple edge segments. 

Gradient based loss function. We define a gradient based loss function that has
a minimum at the correct 3D pose $\theta \in \mathbb{R}^7$ where the projected 3D model matches
the object in the given photo/image. The image gradients of the 3D model surface
normal components and the image gradients of the 2D photo are used to define a
loss function at a given pose $\theta$.

We use $(u,v) \in \mathbb{Z}^2$ to denote 2D pixel coordinates in the photo/image and
$(x,y,z) \in \mathbb{R}^3$ to denote 3D coordinates of the 3D model. Let $W$ be a $d$ dimensional
matrix (for example $d = 3$ if $W$ is an RGB image) with elements $W(u,v) \in \mathbb{R}^d$. We
define the $k$ norm 'gradient magnitude' matrix of $W$ as

$$\|\nabla W(u,v)\|_k^k := \sum_{i=1}^{d} \left( \left|\frac{\partial W_i}{\partial u}\right|^k + \left|\frac{\partial W_i}{\partial v}\right|^k \right) \tag{1}$$

Based on this we have the gradient magnitude matrix $G_I$ for a 2D photo/image $I$ as

$$G_I(u,v) = \|\nabla I(u,v)\|_k^k \tag{2}$$

Let $\phi(x,y,z,\theta) = (\phi_x, \phi_y, \phi_z)^T \in \mathbb{R}^3$ be the unit surface normal at the 3D point
$p = (x,y,z)$ for the 3D model at pose $\theta$. The model is rendered with the surface
normal component values $\phi_x$, $\phi_y$ and $\phi_z$ used as RGB color values in the OpenGL
renderer to obtain the projected surface normal component matrix $\Phi$ such that
$\Phi(u,v,\theta) \in \mathbb{R}^3$ has surface normal component values at the 2D point $(u,v)$ in the
projected image. Based on this we have the gradient magnitude matrix for the surface
normal components as

$$G_N(\theta)(u,v) = \|\nabla \Phi(u,v,\theta)\|_k^k \tag{3}$$

The loss function $L_g(\theta)$ for a given pose $\theta$ is defined as

$$L_g(\theta) := 1 - \left(\mathrm{corr}(G_N(\theta), G_I)\right)^2 \in [0, 1] \tag{4}$$

where $\mathrm{corr}(G_N(\theta), G_I)$ is the Pearson product-moment correlation coefficient [20]
between the matrix elements of $G_N(\theta)$ and $G_I$. This loss has the convenient property
of ranging between 0 and 1. Lower loss values imply a better 3D pose.
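The definitions in Equations 1 to 4 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the implementation used for the results: the partial derivatives are approximated with `np.gradient` finite differences, the rendered surface normal image is assumed to be available as an array, and the function names `gradient_magnitude` and `gradient_loss` are hypothetical.

```python
import numpy as np

def gradient_magnitude(W, k=1):
    """k-norm 'gradient magnitude' matrix of Equation 1.

    W has shape (H, W) or (H, W, d); the result has shape (H, W).
    """
    W = np.asarray(W, dtype=float)
    if W.ndim == 2:
        W = W[:, :, None]                      # treat a grey image as d = 1
    # Assumption: finite differences stand in for the partial derivatives.
    dv, du = np.gradient(W, axis=(0, 1))       # rows ~ v, columns ~ u
    return (np.abs(du) ** k + np.abs(dv) ** k).sum(axis=2)

def gradient_loss(normals_render, photo, k=1):
    """Equation 4: 1 - corr(G_N, G_I)^2, a value in [0, 1]."""
    g_n = gradient_magnitude(normals_render, k).ravel()
    g_i = gradient_magnitude(photo, k).ravel()
    if g_n.std() == 0 or g_i.std() == 0:
        return 1.0                             # correlation undefined: worst case
    r = np.corrcoef(g_n, g_i)[0, 1]            # Pearson correlation coefficient
    return float(np.clip(1.0 - r ** 2, 0.0, 1.0))
```

With this sketch, comparing an image against itself gives a loss of (numerically) zero, while comparing it against a shifted copy gives a larger loss, which is the behaviour the optimisation relies on.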

Visualisation. We illustrate intermediate steps of the loss calculation for a 3D
model of a Mazda 3 car. The surface normal components $\Phi_x(u,v,\theta)$, $\Phi_y(u,v,\theta)$ and
$\Phi_z(u,v,\theta)$ are shown in Figure 2(a-c). Their image gradients are shown in Figure
2(d-i) and the resulting $G_N(\theta)$ matrix image is shown in Figure 2(j). Similarly,
intermediate steps in the calculation of $G_I$ are shown in Figure 3 for a real photo
and a synthetic photo. We show overlaid images of $G_N(\theta)$ and $G_I$ at the known
matching pose in Figure 4, which also shows how the overlap changes when 2 levels
of Gaussian smoothing (described below) are applied to the real and synthetic
photo. The synthetic photos were made by projecting the 3D model at a known
pose $\theta$.

The correlation in Equation 4 will be highest when the 3D model is projected
with pose parameters $\theta_0$ that match the object in the photo $I$, as this has the best
overlap. Therefore the loss will be lowest at the correct pose parameters $\theta_0$, for
values of $\theta$ reasonably close to $\theta_0$. We see this in the loss landscapes in Figure 6.

Gaussian smoothing. We do Gaussian smoothing on the photo and rendered
surface normal component images before calculating $G_I$ (Equation 2) and $G_N(\theta)$
(Equation 3). This is done by convolving with a 2D Gaussian kernel followed by
down-sampling [7]. This makes the loss function landscape less steep and noisy, thus
making it easier to optimise. However, the global optimum tends to deviate slightly
from the correct pose at high levels of Gaussian smoothing. Compare the 1D loss
landscapes shown in Figure 6 for different levels of Gaussian smoothing $n$. Therefore,
we do a series of optimisations starting from the highest level of smoothing, using
the optimum found at level $n$ as the initialisation for level $n - 1$, recursively.
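The coarse-to-fine schedule described above can be sketched as follows. This is a minimal illustration that assumes a separable Gaussian blur with zero padding, a fixed kernel width and 2x down-sampling per level; the kernel size, the names `smooth_and_downsample` and `coarse_to_fine`, and the `optimise` callable are all hypothetical and are not taken from the paper.

```python
import numpy as np

def gaussian_kernel(sigma=1.0):
    """Normalised 1D Gaussian kernel truncated at three standard deviations."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def smooth_and_downsample(img, levels, sigma=1.0):
    """Apply `levels` rounds of Gaussian blur followed by 2x down-sampling.

    Assumes every image side stays longer than the kernel at each level.
    """
    out = np.asarray(img, dtype=float)
    kern = gaussian_kernel(sigma)
    for _ in range(levels):
        # Separable blur: along each row, then along each column.
        out = np.apply_along_axis(lambda r: np.convolve(r, kern, mode="same"), 1, out)
        out = np.apply_along_axis(lambda c: np.convolve(c, kern, mode="same"), 0, out)
        out = out[::2, ::2]                    # keep every second pixel
    return out

def coarse_to_fine(loss_at_level, optimise, theta_init, max_level=2):
    """Optimise at level max_level, then reuse each optimum to start the next level."""
    theta = theta_init
    for n in range(max_level, -1, -1):
        theta = optimise(lambda t: loss_at_level(t, n), theta)
    return theta
```

Here `loss_at_level(theta, n)` would evaluate the loss of Equation 4 on images smoothed with `n` levels, and `optimise(f, theta)` is any minimiser that returns an improved pose.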

Choosing the norm k. We have a choice when selecting the norm for Equations 
2 and 3. Having tested both 1-norm and 2-norm cases we have found the 1-norm to 
be less noisy (as shown in Figure 6) and hence easier to optimise.

[Figure 2 panels: (a)-(c) surface normal components, (d)-(i) image gradients, (j) $G_N(\theta)$]

Figure 2: Visualisation of $G_N(\theta)$ for a 3D model. The x, y and z
component matrices of the surface normal vector are shown in (a)-(c).
Their image gradients are shown in (d)-(i). The resulting $G_N(\theta)$ matrix is shown
in (j). No Gaussian smoothing has been applied.
Colour representation: green=positive, black=zero and red=negative. We use a
horizontal x axis pointing left to right, a vertical y axis pointing top to bottom
and a z axis which points out of the page.
[Figure 3 panels: (a) Real photo, (b) Synthetic photo, (g) Real $G_I$, (h) Synthetic $G_I$]

Figure 3: Intermediate steps in calculating $G_I$ for a real (column 1) and synthetic
photo (column 2). The synthetic photo was made by projecting the 3D model.
Image gradients (rows 2 and 3) and $G_I$ (row 4) are shown. Colour representation:
green=positive, black=zero and red=negative.
[Figure 4 panels: (a) Real, (b) n=0, (c) n=2, (d) Synthetic, (e) n=0, (f) n=2]

Figure 4: Overlaid images of $G_I$ and $G_N(\theta)$ for a real photo (row 1) and a synthetic
photo (row 2) obtained by rendering a 3D model. The first column shows
the photos $I$. The overlaid images of $G_I$ and $G_N(\theta)$ are shown with no Gaussian
smoothing (column 2, n=0) and with Gaussian smoothing at level n=2 (column 3).
The photo is in the green channel and the 3D model is in the red channel, with
yellow showing overlapping regions.

Initialisation. We use a rough pose estimate to seed the optimisation. An
object specific method can be used to obtain the rough pose. Possible methods for
obtaining a coarse initial pose include the work done by [17], [26] and [1]. We have
used the wheel match method developed by Hutter and Brewer [9] to obtain an
initial pose for vehicle photos where the wheels are visible. The wheels need not be
visible with the other methods mentioned above. We use the following to represent
the rough pose of cars as prescribed in [9], which neglects the effects of perspective
projection.

$$\theta' := (\mu_x, \mu_y, \delta_x, \delta_y, \psi_x, \psi_y) \tag{5}$$

$\mu = (\mu_x, \mu_y)$ is the visible rear wheel center of the car in the 2D image. $\delta = (\delta_x, \delta_y)$
is the vector between corresponding rear and front wheel centres of the car in the
2D image. The 2D image is a projection of the 3D model on to the XY plane.
$\psi = (\psi_x, \psi_y, \psi_z)$ is a unit vector in the direction of the rear wheel axle of the 3D car
model. Therefore, $\psi_z = -\sqrt{1 - \psi_x^2 - \psi_y^2}$ and need not be explicitly included in the
pose representation $\theta'$. This representation is illustrated in Figure 5.
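As a small worked illustration of why $\psi_z$ can be dropped from the pose vector, the following sketch recovers the full axle direction from its first two components using the unit-length constraint. The function names and the tuple layout of the rough pose are assumptions made for this example only.

```python
import math

def axle_direction(psi_x, psi_y):
    """Recover the unit rear-axle vector psi from its x and y components.

    Uses psi_z = -sqrt(1 - psi_x^2 - psi_y^2), i.e. the negative root.
    """
    s = 1.0 - psi_x ** 2 - psi_y ** 2
    if s < 0.0:
        raise ValueError("psi_x^2 + psi_y^2 must not exceed 1 for a unit vector")
    return (psi_x, psi_y, -math.sqrt(s))

def unpack_rough_pose(theta_rough):
    """Split theta' = (mu_x, mu_y, delta_x, delta_y, psi_x, psi_y) into its parts."""
    mu = tuple(theta_rough[0:2])        # rear wheel centre in the image
    delta = tuple(theta_rough[2:4])     # rear-to-front wheel centre vector
    psi = axle_direction(theta_rough[4], theta_rough[5])
    return mu, delta, psi
```

For example, `axle_direction(0.6, 0.0)` yields a vector close to `(0.6, 0.0, -0.8)`, which has unit length.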

We include an additional perspective parameter $f$ (the distance to the camera
from the projection plane in the OpenGL 3D frustum) when optimising the loss
function to obtain the fine 3D pose. Hence we define the full 3D pose as follows.

$$\theta := (\mu_x, \mu_y, \delta_x, \delta_y, \psi_x, \psi_y, f) \tag{6}$$

$\theta'$ is converted to translation, scale and rotation as per [9] to transform the 3D
model and, along with $f$, is used to render the 3D model with perspective projection
in OpenGL using pose $\theta$. Thereby, we estimate the full 3D pose by minimizing
Equation 4 w.r.t. $\theta$. Intrinsic camera parameters need not be known explicitly. Note
that any other choice of pose parameters would do. We use the above as it is
convenient with cars.

Figure 5: We illustrate components of the pose representation $\theta'$ (Equation 5) used
for 3D models of cars. We use the rear wheel center $\mu$, the vector between the wheel
centres $\delta$ and the unit vector $\psi$ in the direction of the rear wheel axle.

Background removal. Background clutter in the photo adds considerable noise
to the loss function landscape, so we use an adaptation of the
GrabCut [24] method to remove a considerable amount of the background pixels from
the photo. Although this does not result in a perfect removal of the background, it
significantly improves the pose estimation results. The initial rough pose estimate is
used as a prior to generate the background and foreground GrabCut masks (see footnote 1).
Figure 7(b) shows results of the background removal.

Optimisation. We use the downhill simplex optimiser [16] to find the pose
parameters $\theta_0$ which give the lowest loss value for Equation 4. This optimiser is very
robust and is capable of moving out of local optima by reinitialising the simplex.
Downhill simplex does not require gradient calculations. Gradient based optimisers
would be problematic given the noisy loss landscapes in Figure 6. We use the fine pose
thus obtained to register the 3D model on the 2D photo. This is used to initialise
contour detection based image segmentation.
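To make the optimisation step concrete, the sketch below shows a simplified downhill simplex (Nelder-Mead) minimiser in plain Python, together with a restart loop that rebuilds the simplex around the current optimum. It is an illustrative variant only: it uses reflection, expansion, inside contraction and shrinking with conventional coefficients, and the step size, iteration limits and function names are assumptions rather than the settings used in this work.

```python
def nelder_mead(f, x0, step=0.1, max_iter=500, tol=1e-10):
    """Simplified downhill simplex minimiser for a function of a list of floats."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced along each coordinate axis.
    simplex = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    values = [f(x) for x in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if abs(values[-1] - values[0]) < tol:
            break
        # Centroid of all vertices except the worst one.
        centroid = [sum(x[j] for x in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(t):
            # Point centroid + t * (centroid - worst).
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        f_r = f(reflected)
        if f_r < values[0]:
            expanded = along(2.0)
            f_e = f(expanded)
            simplex[-1], values[-1] = (expanded, f_e) if f_e < f_r else (reflected, f_r)
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            contracted = along(-0.5)
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:
                # Shrink every vertex halfway towards the best one.
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)] for v in simplex[1:]
                ]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    best_i = min(range(n + 1), key=lambda i: values[i])
    return simplex[best_i], values[best_i]

def minimise_with_restarts(f, x0, restarts=3, step=0.1):
    """Re-run the simplex from the current optimum with a freshly built simplex."""
    x, fx = nelder_mead(f, x0, step)
    for _ in range(restarts):
        x_new, f_new = nelder_mead(f, x, step)
        if f_new < fx:
            x, fx = x_new, f_new
    return x, fx
```

In the setting of this paper, `f` would be the loss of Equation 4 evaluated at a seven-parameter pose and `x0` the rough pose; no derivatives of the loss are needed.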



4 Contour Detection 

In this section, we discuss the procedure of contour detection used to segment the 
known object in the image. We use a variation of the level set method which does 
not require re-initialisation [11] to find boundaries of relevant object parts. 

Most active contour models implement an edge-function to find boundaries.
The edge-function is a gradient dependent, positive, decreasing function. A common
formulation is as follows

$$g(|\nabla I|) = \frac{1}{1 + |\nabla (G_\sigma \otimes I)|^p}, \quad p \geq 1 \tag{7}$$

where $G_\sigma \otimes I$ denotes a smoother version of the 2D image $I$, $G_\sigma$ is an isotropic
Gaussian kernel with standard deviation $\sigma$, and $\otimes$ is the convolution operator.
Therefore $g(|\nabla I|)$ tends to 0 as $|\nabla I|$ approaches infinity, i.e.

$$\lim_{|\nabla I| \to \infty} g(|\nabla I|) = 0, \quad \text{when } \sigma = 0. \tag{8}$$

Footnote 1: We use the cv::grabCut() method provided in OpenCV [2] version 2.1.

[Figure 6 panels: (a) 2-norm, (b) 1-norm. Horizontal axis: percentage shift in x direction from known pose, from -15 to 15]

Figure 6: We compare 1-norm and 2-norm loss landscapes obtained by shifting the
3D model along the x direction from a known 3D pose. The horizontal axis shows
the percentage deviation along the x axis. The numbers in the legend show the
level of Gaussian smoothing $n$ applied on the gradient images before calculating the
loss in Equation 4. We note that the 1-norm loss is less noisy compared to the
2-norm loss. The actual loss function is seven dimensional and graphs of the other
dimensions are similar.

As per [11], a Lipschitz function $\phi$ is used to represent the curve
$C = \{(u,v) \mid \phi_0(u,v) = 0\}$ such that

$$\phi_0(u,v) = \begin{cases} -\rho, & (u,v) \text{ inside contour } C \\ 0, & (u,v) \text{ on contour } C \\ \rho, & (u,v) \text{ outside contour } C \end{cases} \tag{9}$$
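Equation 9 can be illustrated on a pixel grid as follows. This is a sketch under assumptions: the region enclosed by $C$ is given as a binary mask, the contour is taken to be the boundary pixels of that mask (pixels of the region with at least one 4-neighbour outside it), and the value `rho=4.0` and the name `initial_level_set` are hypothetical.

```python
import numpy as np

def initial_level_set(inside_mask, rho=4.0):
    """Step function of Equation 9: -rho inside C, 0 on C, +rho outside C."""
    inside = np.asarray(inside_mask, dtype=bool)
    padded = np.pad(inside, 1, mode="constant", constant_values=False)
    # Strictly interior pixels: the pixel and its four neighbours are all inside.
    interior = (
        padded[1:-1, 1:-1] & padded[:-2, 1:-1] & padded[2:, 1:-1]
        & padded[1:-1, :-2] & padded[1:-1, 2:]
    )
    phi0 = np.full(inside.shape, rho, dtype=float)   # outside the contour
    phi0[inside] = 0.0                               # boundary pixels stay at 0
    phi0[interior] = -rho                            # strictly inside the contour
    return phi0
```

The zero entries of the returned array then mark the discrete initial curve from which the level set evolution starts.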

As with other level set formulations like [4] and [13], the curve $C$ is evolved using
the mean curvature $\mathrm{div}(\nabla\phi/|\nabla\phi|)$ in the normal direction $|\nabla\phi|$. Therefore the
curve evolution is represented by $\partial\phi/\partial t$ as

$$\frac{\partial \phi}{\partial t} = |\nabla\phi| \left( \mathrm{div}\left( g(|\nabla I|) \frac{\nabla\phi}{|\nabla\phi|} \right) + \nu\, g(|\nabla I|) \right), \tag{10}$$
$$\phi(0,u,v) = \phi_0(u,v), \quad (t,u,v) \in [0,\infty) \times \mathbb{R}^2$$

where the evolution of the curve is given by the zero-level curve at time $t$ of the
function $\phi(t,u,v)$. $\nu$ is a constant to ensure that the curve evolves in the normal
direction, even if the mean curvature is zero.
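The edge function of Equation 7 and a single time step of Equation 10 can be sketched as below. This is a plain explicit Euler discretisation with central differences, written only to illustrate the terms of the equation; it is not the numerical scheme of [11], and the time step `dt`, the regulariser `eps`, the exponent `p=2` and the function names are assumptions. The input image is assumed to be smoothed already.

```python
import numpy as np

def edge_function(image, p=2):
    """g = 1 / (1 + |grad I|^p) from Equation 7 for a pre-smoothed grey image."""
    gy, gx = np.gradient(np.asarray(image, dtype=float))
    return 1.0 / (1.0 + np.hypot(gx, gy) ** p)

def evolve_step(phi, g, nu=1.0, dt=0.1, eps=1e-8):
    """One explicit Euler step of Equation 10."""
    py, px = np.gradient(phi)                       # d(phi)/dv, d(phi)/du
    norm = np.sqrt(px ** 2 + py ** 2)               # |grad phi|
    nx, ny = px / (norm + eps), py / (norm + eps)   # grad phi / |grad phi|
    # div(g * grad phi / |grad phi|)
    div = np.gradient(g * nx, axis=1) + np.gradient(g * ny, axis=0)
    return phi + dt * norm * (div + nu * g)
```

With the sign convention of Equation 9 (negative inside the curve), a negative `nu` in this sketch makes the enclosed region grow and a positive `nu` makes it shrink. Where `g` is close to zero, i.e. on strong edges, both terms vanish and the curve stops moving.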
Theoretically, as the image gradient on an edge/boundary of an image segment
tends to infinity, the edge function $g$ (Equation 7) is zero on the boundary. This
causes the curve $C$ to stop evolving at the boundary (Equation 10). However, in
practice the edge function may not always be zero at image boundaries of complex
images and the performance of the level set method is severely affected by noise.
Isotropic Gaussian smoothing can be applied to reduce image noise, but over-smoothing
will also smooth the edges, in which case the level set curve may miss the boundary
altogether. This is a common problem not only for the level set method in [11] but
also for other active contour models [4, 14, 12, 13]. Additionally, the efficiency and
effectiveness of the level set method in boundary detection depend heavily on the
initialisation of the curve. Without appropriate initialisation, the curve is frequently
trapped in local minima.

A very close initialisation curve can eliminate this problem. In our approach,
the initialisation curve is obtained by registering a 3D model over the photo as
described in Section 3. Since the parts $p$ in the 3D model are already known, they
can be projected at the known 3D pose $\theta$ to obtain a selected part outline $o_p$ in 2D.
An 'erosion' morphological operator is applied on $o_p$ to obtain the initial curve $\phi_{0,p}$
which is inside the real boundary.
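The erosion step can be sketched as follows, assuming the projected part outline has been filled into a binary mask. The structuring element (a 3x3 cross) and the number of iterations are assumptions made for illustration; the paper does not state which were used.

```python
import numpy as np

def erode(mask, iterations=1):
    """Binary erosion with a 3x3 cross (4-neighbourhood) structuring element."""
    out = np.asarray(mask, dtype=bool)
    for _ in range(iterations):
        p = np.pad(out, 1, mode="constant", constant_values=False)
        # Keep a pixel only if it and its four neighbours are all set.
        out = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
               & p[1:-1, :-2] & p[1:-1, 2:])
    return out
```

Each iteration peels one layer of pixels off the region, so the eroded mask always lies inside the projected outline. More iterations give a more conservative initial curve at the cost of a longer level set evolution.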

The green curves (initialisation images in Figures 9, 10 and 11) are used to
denote the 2D outlines of projected parts in the 3D model, while the red curves are
the initialisation curves obtained by eroding these green curves. The level set starts
with the initial curve $\phi_{0,p}$ to find the actual boundary $\phi_{r,p}$ in the 2D image of the
vehicle, for each part $p$. The yellow curves (result images in Figures 9, 10 and 11)
indicate the actual boundaries detected.

The entire process of 'Model Assisted Segmentation' is given in pseudo-code in 
Algorithm 1. 



Algorithm 1 Model Assisted Segmentation

Input: Let $I$ = Given image, $M$ = Known 3D model

Output: Segmentation curves $\phi_{r,p}$ for selected model parts $p$

1: $\theta' \leftarrow$ Rough pose from $I$
2: $I' \leftarrow$ Remove background in $I$ using $\theta'$
3: $\theta \leftarrow \theta'$
4: for $n$ = 2 down to 0 do
5:    $\theta \leftarrow$ Optimise $L_g(\theta)$ on $I'$ starting from $\theta$ using $n$ levels of Gaussian smoothing
6: end for
7: for $p \in$ Selected parts in $M$ do
8:    $o_p \leftarrow$ Outline of $p$ projected using $\theta$
9:    $\phi_{0,p} \leftarrow$ Apply erosion operation on $o_p$
10:   $\phi_{r,p} \leftarrow$ Output of level set on $I$ using $\phi_{0,p}$ as initial curve
11: end for
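The control flow of Algorithm 1 can be written as a short Python skeleton. Every stage is passed in as a callable, and all of these callables and their signatures are hypothetical placeholders for the components described in Sections 3 and 4; the sketch only shows how the stages are chained.

```python
def model_assisted_segmentation(image, model, parts, *, rough_pose,
                                remove_background, optimise_pose,
                                project_outline, erode, level_set,
                                max_level=2):
    """Chain the stages of Algorithm 1; each stage is a caller-supplied function."""
    theta_rough = rough_pose(image)                     # step 1
    cleaned = remove_background(image, theta_rough)     # step 2
    theta = theta_rough                                 # step 3
    for n in range(max_level, -1, -1):                  # steps 4-6
        # Optimise the loss on the cleaned image with n smoothing levels.
        theta = optimise_pose(cleaned, model, theta, n)
    curves = {}
    for p in parts:                                     # steps 7-11
        outline = project_outline(model, p, theta)
        init_curve = erode(outline)
        # The level set runs on the original image, as in step 10.
        curves[p] = level_set(image, init_curve)
    return theta, curves
```

Passing the stages as arguments keeps the skeleton independent of any particular renderer, optimiser or level set implementation.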
5 Results

[Figure 7 panels: (a) Photo, (b) Background removed, (c) Rough pose, (d) Fine pose n=2, (e) Fine pose n=1, (f) Final fine pose n=0]

Figure 7: The images show pose estimation results for a real photograph of a Mazda
Astina car. The original photograph and subsequent images have been cropped
for clarity. The fine 3D pose in (f) is obtained by optimising the novel gradient
based loss function (Equation 4) using the rough pose in (c). The rough pose is
obtained as prescribed in [9]. Much of the background is removed (b) from the
original photo (a) using an adaptation of 'Grabcut' [24] when estimating the fine
3D pose. Intermediate steps of optimising the loss function with different levels of
Gaussian smoothing n applied on the gradient images are shown in (d), (e) and (f).
The close ups in Figure 8 highlight the visual improvement during the intermediate steps.



We apply our method to segment components of a real car from a photograph 
as follows. 

Pose estimation. The results of registering the 3D model over the photograph
(pose estimation) are shown in Figure 7. A gradient sketch of the 3D model is
drawn over the photograph in yellow to indicate the pose of the 3D model at each
step in Figure 7. The wheels of the 3D model do not match the wheels in the photo
due to the effects of wheel suspension. Since we are interested in segmenting parts
of the car body, the wheels have been removed from the 3D model for the fine pose
estimation. The original photograph in Figure 7(a) shows the side view of a Mazda
Astina car. We register a triangulated 3D model of the car obtained by a 3D laser
scan. The rough 3D pose obtained using the wheel locations [9] is shown in Figure
7(c). The result of the approximate background removal is shown in Figure 7(b). We
optimise the gradient based loss function (Equation 4) for the image in Figure 7(b)
with respect to the seven pose parameters (Section 3) to obtain the fine 3D pose.
The optimisation is done sequentially, moving from the highest level of Gaussian
smoothing to the lowest. We start from the rough pose with two levels of Gaussian
smoothing and obtain the pose in Figure 7(d). Next we use this pose to initialise an
optimisation of the loss function with one level of Gaussian smoothing and obtain
the pose in Figure 7(e). Finally, we use this pose to perform one more optimisation with
no Gaussian smoothing and obtain the final fine 3D pose shown in Figure 7(f). We
note that the visual improvement in the image overlays gets smaller at each
successive step, as the level of Gaussian smoothing decreases. However, the
improvement in the 3D pose becomes more apparent when we compare the close ups in
Figures 8(a), 8(b) and 8(c).

[Figure 8 panels: (a) n=2, (b) n=1, (c) n=0]

Figure 8: Close ups at each step of the optimisation (shown in Figure 7) for different
levels of Gaussian smoothing n highlight the visual improvement in the 3D pose.

[Figure 9 panels: (a) Initialisation, (b) Result]

Figure 9: The figure shows the 'Model Assisted Segmentation' results for a real
photo of a Mazda Astina car. The initialisation curves for a selection of car body
parts are shown in 9(a) based on the fine 3D pose shown in Figure 7(f). The 3D
model outlines are shown in 'green' and the initialisation curves obtained by eroding
these outlines are shown in 'red'. The resulting segmentation is shown in 9(b). Close
ups are shown along with benchmark results in Figure 10.

[Figure 10 panels: (e) Initialisation, (f) Result, (g) Benchmark - GC, (h) Benchmark - LS]

Figure 10: Different close ups (row wise) for the results in Figure 9 are shown with
the initialisation curves (column 1), our results (column 2) and benchmark results
(columns 3 and 4). Our results are more accurate in general. Note the bleeding
and false positives in the benchmark results. Our method also sub-segments the
image into meaningful parts.

Segmentation. Segmentation results based on contour detection for the photograph
in Figure 7(a) using the fine 3D pose (Figure 7(f)) are shown in Figures 9 and 10.
The segmentation results for a selection of car parts (front and back doors, front and
back windows, fender, mud guard and front buffer) are shown in Figure 9(b) by the
yellow curves. The part boundaries obtained by projecting the 3D model are shown
in green and the initialisation curves are shown in red in Figure 9(a). For the sake
of clarity we also include close ups of a few parts. The initialisation curves and the
segmentation results for the back door and window are shown in Figures 10(a) and
10(b), using the same color code. Close ups for the front parts are shown in Figures
10(e) and 10(f). We see the high amount of reflection in the car body deteriorating
the performance of the segmentation in the latter case, especially around the
hood of the car and windshield. In contrast, the mud guard, lower parts of the buffer
and fender are segmented out quite well in Figure 10(f) as there is less reflection
noise in that region. Results for a semi-profile view of the car are shown in Figures
1 and 11 using the same convention.

[Figure 11 panels: row 1: (a) Initialisation, (b) Result, (c) Benchmark - LS, (d) Benchmark - GC; row 2: (e) Initialisation, (f) Result, (g) Benchmark - GC, (h) Benchmark - LS; row 3: (i) Initialisation, (j) Result, (k) Benchmark - GC, (l) Benchmark - LS]

Figure 11: The figures show different close ups (row wise) for the results in Figure
1. Initialisation curves (column 1), our results (column 2) and benchmark results
(columns 3 and 4) are shown. We note that our results are more accurate and
sub-segment the car into meaningful components.

Table 1: Accuracy of the sub-segmented parts measured against hand annotated
ground truth.

Accuracy. The accuracy of the results has been compared against a ground
truth obtained from the photos by hand annotation in Table 1. We calculate the
accuracy as



a = l- ^^ R{ l v) 7 U t V)l (11) 



where $U_R$ and $U_G$ are two binary images of the sub-segmentation result and ground
truth respectively. We note that the accuracy is considerably high. Also, the side
view has a higher accuracy in general because the pose estimation gave a better
result and hence the segmentation was better initialised.



Benchmark tests. Our results from Model Assisted Segmentation were compared
with state of the art image segmentation methods 'Grabcut (GC)' [24] and 'Level
set (LS)' [11] which do not use any Model Assistance. A bounding box has been
used to initialise the benchmark methods. We compare our results (Figures 10(b)
and 10(f)) with the benchmark tests in Figure 10. The segmentation results using our
method are more accurate in general. In addition to this, our method has the added
advantage of sub-segmenting parts of the same object. This is a non-trivial task
for conventional segmentation methods when the sub-segments of the object share
the same colour and texture. In terms of overall performance, we observe that in
our method the segmentation results 'bleed' a lot less into adjacent areas, unlike
with the benchmark results. In terms of sub-segmenting parts of the same object,
we see in Figure 10(f) that our method is capable of successfully segmenting out
the fender, mud guard and the buffer from the front door, unlike the benchmark
methods. In fact it would be extremely difficult (if not impossible) to sub-segment
parts of the front of the car which are painted the same color with conventional
methods. Similarly the back door, back window and the smaller glass panel have
been segmented out in Figure 10(b) whereas the benchmark methods group them
together. Results for a semi-profile view of the car are shown in Figure 1 with close
ups and benchmark comparisons in Figure 11. Our results are better and separate
the object into meaningful parts.
6 Discussion



The Model Assisted Segmentation method described in this paper can segment parts 
of a known 3D object from a given image. It performs better than the state of the 
art and can segment (and separate) parts that have similar pixel characteristics. 
We present our results on images of cars. The highly reflective surfaces of cars 
make the pose estimation as well as the segmentation tasks more difficult than with 
non-reflective objects. 

We note that a close initialisation curve obtained from the 3D pose estimation 
significantly improves the performance of contour detection, and hence the image 
segmentation. However, the presence of reflections can deteriorate the quality of 
the results. We intend to explore avenues to make the process more robust in the 
presence of reflections. 